# Temporal Localization
VideoMind 2B
BSD-3-Clause
VideoMind is a multimodal agent framework that enhances video reasoning capabilities by simulating human thought processes (such as task decomposition, moment localization & verification, and answer synthesis).
Video-to-Text
yeliudev
207
1
CogVLM2 Video Llama3 Chat
Other
CogVLM2-Video is a high-performance video understanding model that achieves state-of-the-art performance on multiple video question-answering tasks and can complete video understanding within one minute.
Text-to-Video
Transformers English

THUDM
2,384
48